#### McStas McXtrace





Team



Peter Willendrup DTU / ESS DMSC



Mads Bertelsen ESS DMSC



Gregory S Tucker ESS DMSC



Emmanuel Farhi Synchrotron SOLEIL



Tobias Weber Institut Laue-Langevin



José Robledo FZ Jülich / IAS / JSC

**Team Mentors** 



Jan-Oliver Mirus FZ Jülich / IAS / JSC



Ilya Zhukov FZ Jülich / IAS / JSC





# **Profiler Output**

We are likely limited by our "array of structs" data layout

Too many registers required per thread




Even the 'simplest of all cases' is not great in this respect.
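To make the layout question concrete, here is a minimal C sketch contrasting an "array of structs" with a "struct with arrays". All type, field and function names (`particle_aos`, `particles_soa`, `propagate_*`) are hypothetical and are not taken from the McStas / McXtrace sources. The sketch only illustrates the memory layout; whether it reduces the register count per thread would have to be checked with the profiler.

```c
#include <assert.h>
#include <stdlib.h>

/* Array-of-structs layout: all fields of one particle sit together. */
typedef struct {
    double x, y, z;
    double vx, vy, vz;
    double t, p;
} particle_aos;

/* Struct-with-arrays layout: one contiguous array per field. */
typedef struct {
    double *x, *y, *z;
    double *vx, *vy, *vz;
    double *t, *p;
    size_t n;
} particles_soa;

/* Allocate all eight field arrays from one zero-initialised block. */
int soa_alloc(particles_soa *s, size_t n)
{
    assert(s != NULL && n > 0);
    double *block = calloc(8 * n, sizeof *block);
    if (block == NULL)
        return -1;
    s->x  = block;         s->y  = block + n;     s->z  = block + 2 * n;
    s->vx = block + 3 * n; s->vy = block + 4 * n; s->vz = block + 5 * n;
    s->t  = block + 6 * n; s->p  = block + 7 * n;
    s->n  = n;
    return 0;
}

/* The block starts at s->x, so freeing s->x releases everything. */
void soa_free(particles_soa *s)
{
    free(s->x);
    s->x = NULL;
    s->n = 0;
}

/* Free-flight step on the array-of-structs layout. */
void propagate_aos(particle_aos *p, size_t n, double dt)
{
    for (size_t i = 0; i < n; i++) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
        p[i].t += dt;
    }
}

/* Same step on the struct-with-arrays layout: neighbouring
   iterations touch neighbouring memory in every field array. */
void propagate_soa(particles_soa *s, double dt)
{
    for (size_t i = 0; i < s->n; i++) {
        s->x[i] += s->vx[i] * dt;
        s->y[i] += s->vy[i] * dt;
        s->z[i] += s->vz[i] * dt;
        s->t[i] += dt;
    }
}
```

Both functions compute the same result; only the way consecutive particles are laid out in memory differs.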



### Progress and Goals

- Tobias identified and corrected a thread race condition in a specific instrument / component
- Emmanuel tried chatbot suggestions for potentially missing `#pragma`s in key component codes... "Not so easy" ;-)
- Goals for today:
  - More profiling
  - Experiments with 'struct with arrays' approach
  - Limit device-host-device transfers (only some data needs to be copied back to the host)
  - Take a look at an alternative FUNNEL sorting algorithm (perhaps something readily provided in CUDA?)
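As a point of reference for the sorting goal, the sketch below shows a plain C partition step, under the assumption that the purpose of the sort is to group still-active particles ahead of absorbed ones. The function name `partition_active` and the `absorbed` flag array are hypothetical. On the CUDA side, the Thrust library shipped with the CUDA toolkit offers `thrust::partition` and `thrust::sort_by_key`, which could be candidates to evaluate; this sketch only illustrates the operation itself on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Reorder the index array idx[0..n) in place so that indices of
   active particles (absorbed[i] == 0) come first.
   Returns the number of active particles.
   The relative order inside each group is not preserved. */
size_t partition_active(const int *absorbed, size_t *idx, size_t n)
{
    assert(absorbed != NULL && idx != NULL);
    size_t lo = 0;   /* idx[0..lo) are known to be active     */
    size_t hi = n;   /* idx[hi..n) are known to be absorbed   */
    while (lo < hi) {
        if (!absorbed[idx[lo]]) {
            lo++;
        } else {
            /* Move the absorbed entry to the tail and re-examine
               the entry that was swapped into position lo. */
            hi--;
            size_t tmp = idx[lo];
            idx[lo] = idx[hi];
            idx[hi] = tmp;
        }
    }
    return lo;
}
```

Working on an index array rather than on the particle data keeps the step independent of the chosen data layout.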







#### **Problems and Solutions**

- Refactoring to try out a new overall code structure is challenging
  - Hacking a local C file is error-prone, whereas a 'full integration' requires a lot of work
  - Limit device-host-device transfers (only some data needs to be copied back to the host)
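The transfer point can be sketched as follows, assuming OpenACC-style directives (an assumption; the slides only mention `#pragma`s). The function `trace_step`, its parameters and the histogram "detector" are hypothetical and not taken from the McStas / McXtrace code. Particle arrays are copied to the device once, and only the small result array is copied back.

```c
#include <assert.h>
#include <stddef.h>

/* Advance n particles along x and bin them into a histogram.
   With an OpenACC compiler, x and vx are only copied host-to-device
   (copyin); the updated x stays on the device and only hist is
   copied back (copy). Without OpenACC the directives are ignored
   and the loops simply run on the host. */
void trace_step(double *x, const double *vx, double *hist,
                size_t n, size_t nbins, double dt, double xmax)
{
    assert(x != NULL && vx != NULL && hist != NULL && xmax > 0.0);

    #pragma acc data copyin(x[0:n], vx[0:n]) copy(hist[0:nbins])
    {
        #pragma acc parallel loop
        for (size_t i = 0; i < n; i++)
            x[i] += vx[i] * dt;

        #pragma acc parallel loop
        for (size_t i = 0; i < n; i++) {
            if (x[i] >= 0.0 && x[i] < xmax) {
                size_t b = (size_t)(x[i] / xmax * (double)nbins);
                if (b < nbins) {
                    /* Several particles may hit the same bin. */
                    #pragma acc atomic update
                    hist[b] += 1.0;
                }
            }
        }
    }
}
```

The design choice here is to put one data region around both loops, so that the particle arrays are not moved between host and device from one loop to the next.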



